PREFACE


The International Conference on Computer, Engineering, Law, Education and Management 
(ICCELEM 2017) was held on 28-29 September 2017 at The Westin Chosun Seoul, Seoul, South Korea, 
in collaboration with the Association of Scientists, Developers and Faculties (ASDF), an international body.

ICCELEM 2017 provided an opportunity for academic and industry professionals to discuss recent 
progress in Computer, Engineering, Law, Education and Management. The outcomes of the conference 
are expected to stimulate further research and future technological improvement. The conference 
highlights novel concepts and improvements in research and technology.

The technical committee, consisting of experts in various subfields, helped scrutinize the technical 
papers and maintain the quality of the conference proceedings. The proceedings contain information on 
various advancements in research and development worldwide and should serve as a primary resource for 
researchers seeking knowledge in their fields.

The constant support and encouragement of Dr. S. Prithiv Rajan, ASDF Global President, and 
Dr. P. Anbuoli, ASDF International President, helped greatly in conducting the conference and in 
publishing the proceedings within a short span. I would like to express my deep appreciation and 
heartfelt thanks to the ASDF team members, without whom the proceedings could not have been 
completed successfully. I would also like to express my sincere thanks to our management, 
student friends and colleagues for their involvement, interest and enthusiasm in bringing these 
conference proceedings to a successful conclusion.


Dr. K Kokula Krishna Hari, 


Chief Editor cum Convener, 
General Secretary, ASDF International, London, United Kingdom 
Embedding the Multiple Linear Regression Model to Monitor Student Performance in the Flexible Digital Learning Environment

Benilda Eleonor V. Comendador¹

¹Polytechnic University of the Philippines, Anonas St., Sta. Mesa, Manila, Philippines

Abstract- This study focused on providing a Decision Support System (DSS) that integrates a Multiple Linear Regression (MLR) model to monitor 
student performance in the flexible digital learning environment. The author carried out a series of experiments to evaluate the performance 
and usefulness of the generated models. MLR was adopted in the development of the Learning Analytics Decision Support System. The developed 
application predicts the performance of university portal users, which may help Distance Education (DE) students succeed in the blended 
learning approach provided by DE educators.


I. Introduction 

Nowadays, instructional technologies are in transition from mobile learning to ubiquitous learning, in which educational materials 
are accessible anytime, anywhere and in any form (text, video and audio) to all educational stakeholders via eLearning platforms. 
With this innovation, enrollment in Distance Education (DE) courses has become attractive to most learners. 
However, course drop-out and failure rates in this mode of learning are reported to be increasing steadily. In this light, 
academia needs to develop tools and methods that explore data from eLearning software and that can support 
teachers and students in taking action based on the evaluation of educational data.

Policy makers and administrators should include analytics, user modeling, user profiling and clustering, domain modeling, 
relationship mining and data visualization to unveil outcome-oriented, actionable insights from specific learning behaviors [1]. 
In line with this, some educational institutions (e.g. University of the Philippines Open University, Mindanao State University, 
California State University and Monash University in Australia) have implemented Learning Management Systems (LMS) to manage 
the courses they offer on the Internet [3]. In 2015, the eLearning industry reported that 74% of companies currently use an LMS together with 
virtual classroom, webcasting and video broadcasting tools [4]. Some of the most popular LMSs include Edmodo and the open-source 
Modular Object-Oriented Dynamic Learning Environment (Moodle) [8]. Based on these studies, Moodle was the 
most recommended LMS because its administration and control can be handled by the institution, enabling further analytics such as 
tracking students' web logs to monitor their progress and other activities [2].

According to Romero et al., the application of data mining in e-learning is not much different from that in any other application area [9]. However, some 
important issues make data mining in e-learning different from other areas, namely (1) the data, (2) the objective and (3) the techniques. 
In other web-based systems the data used is normally a simple web server access log, but in e-learning much more 
information is available about the student's interaction, such as details on each online assessment task (assignment, quiz, discussion 
forum and chat) [6]. Similarly, Picciano suggested that LMSs should provide constant monitoring of student activity, whether 
responses, postings on a discussion board, accesses of reading material, completions of quizzes or some other 
assessment. Thus, universities should analyze the data gathered from LMS users and stored on the web server to discover 
knowledge that will enhance the students' online experience [7].

The ECAR Working Group (2015) suggested that educators can tap more dynamic data produced from a range of instructional technologies 
(such as LMS event log data, electronic gradebook data, attendance data and library data) for learning analytics, which, when combined 
with traditional measures, allow for a more nuanced and personalized analysis [5]. Challenges of implementing data mining and learning 
analytics include the high cost of data collection and storage, the development of algorithms, and the interoperability of administrative 
and learning systems and data types. As such, their report recommends that researchers combine data types, with acceptable validity, 
privacy and ethical standards applied, for improved predictive power [1].

This paper is prepared exclusively for International Conference on Computer, Engineering, Law, Education and Management 2017 [ICCELEM 2017] which 
is published by ASDF International, Registered in London, United Kingdom under the directions of the Editor-in-Chief Dr. K Kokula Krishna Hari and 
Editors Dr. Daniel James, Dr. Saikishore Elangovan. Permission to make digital or hard copies of part or all of this work for personal or classroom use is 
granted without fee provided that copies are not made or distributed for profit or commercial advantage, and that copies bear this notice and the full citation 
on the first page. Copyrights for third-party components of this work must be honoured. For all other uses, contact the owner/author(s). Copyright Holder 
2017 © Reserved by Association of Scientists, Developers and Faculties [www.ASDF.international]

In one of the student workshops conducted by the University of Lincoln, students suggested capabilities 
they would like to see in a learning analytics application. These included notifications on grades and progress toward objectives; 
the ability to give immediate feedback to lecturers and professors in order to improve the course; and reading-list functionality 
that presents metrics on how students engage with the texts [10].

Currently, little research has focused on university portal learning analytics for predicting the 
academic performance of DE students. Thus, this study focused on providing a Decision Support System (DSS) that 
integrates the Multiple Linear Regression (MLR) data mining model, so that portal providers and users can analyze and predict the 
performance of distance students scientifically.

II. Framework of the Study 

Fig. 1 illustrates the framework of the study. It consists of four major phases: 1) development of a student performance 
predictive model, 2) testing and implementation of the predictive model, 3) development of the decision support system to 
identify at-risk students for early intervention and attrition prevention, and 4) evaluation of the developed software by the 
respondents in terms of functionality, usability, reliability and portability.
Figure 1. Framework of the Study

During the first phase, the author utilized data sets from the university's Academic Institutions Management Systems (AIMS). 
The author then preprocessed the data and converted the extracted data into the format required by the 
Waikato Environment for Knowledge Analysis (WEKA) tool. To generate the students' performance prediction model, the 
author conducted a series of experiments to identify the appropriate classification technique for predicting students' final rating based on 
their usage data in the university portal. In the next phase, testing and implementation of the model, the author identified 
interesting rules and patterns for decision-making, repeating the data mining and pattern analysis process whenever she 
judged the results unremarkable. In the third phase of the study, the predictors of student online performance, 
the data mining techniques and the decision support system concept were integrated to develop the software, as shown in Fig. 1.

The independent variables in this study were the records of students' performance in online assessment tasks posted by the course 
specialists in the university portal. The expected output was the student performance model developed with MLR. This 
model may give students considerable time and opportunity for early intervention to improve their scholastic performance, and 
may help Distance Education providers lessen the drop-out rate. During the fourth phase of the study, the developed software 
was evaluated by the respondents in terms of functionality, usability, reliability and portability based on the ISO 
9126 Quality Model for Software Evaluation. The author used statistical tools such as the weighted mean, ranking method and 
percentage to summarize and analyze the respondents' evaluation of the developed software.

III. Results and Discussions 
A. The Multiple Linear Regression Model 

The original MLR dataset was divided into two using the 80:20 rule: a training and validation set of 
19 instances and a test set of 7 instances. Equation 3.0 shows the MLR equation generated by WEKA using 
10-fold cross-validation and a confidence factor of 0.25, without preprocessing of attributes. This model is referred to as MLR Model A in 
this study.
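WEKA runs the 10-fold cross-validation internally. As a minimal sketch of the idea only, the hypothetical helper `k_fold_indices` below partitions n instance indices into k contiguous folds and yields each train/test split, so every instance is tested exactly once. WEKA's own implementation differs in that it randomizes the instance order with a seed before splitting.

```python
def k_fold_indices(n, k=10):
    """Yield (train, test) index lists for k-fold cross-validation.

    Hypothetical helper: folds are contiguous (no shuffling) and differ in
    size by at most one; each index appears in exactly one test fold.
    """
    if not 1 < k <= n:
        raise ValueError("k must satisfy 1 < k <= n")
    base, extra = divmod(n, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds receive one additional instance.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


# Example: the 19-instance training/validation set split into 10 folds.
folds = list(k_fold_indices(19, 10))
print([len(test) for _, test in folds])  # [2, 2, 2, 2, 2, 2, 2, 2, 2, 1]
```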

MLR Model A in equation form is:

$$\text{Final Rating} = 0.2082 \times \text{Assign\_1} + 0.1987 \times \text{Assign\_2} + 0.205 \times \text{QUIZ\_1} + 0.1963 \times \text{QUIZ\_2} + 0.1986 \times \text{Assign\_3} - 0.4605 \tag{3.0}$$
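To make Equation 3.0 concrete, the sketch below evaluates MLR Model A for one student. The dictionary keys mirror the attribute names in the equation. The function name and the example scores (90 on every task) are hypothetical, and the scores are assumed to be on the same scale as the data WEKA was trained on.

```python
# Coefficients and intercept of MLR Model A (Equation 3.0).
MODEL_A_COEFFICIENTS = {
    "Assign_1": 0.2082,
    "Assign_2": 0.1987,
    "QUIZ_1": 0.205,
    "QUIZ_2": 0.1963,
    "Assign_3": 0.1986,
}
MODEL_A_INTERCEPT = -0.4605


def predict_final_rating(scores):
    """Predicted Final Rating under MLR Model A (hypothetical helper).

    `scores` maps every attribute in MODEL_A_COEFFICIENTS to a student's
    score; a missing attribute raises KeyError instead of counting as 0.
    """
    weighted = sum(coef * scores[name]
                   for name, coef in MODEL_A_COEFFICIENTS.items())
    return weighted + MODEL_A_INTERCEPT


# Hypothetical student who scored 90 on every online assessment task.
student = {name: 90 for name in MODEL_A_COEFFICIENTS}
print(predict_final_rating(student))  # approximately 90.1515
```

The five coefficients sum to about 1.007 and the intercept is small. The prediction is therefore close to a simple average of the five task scores.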

Fig. 2 shows the attribute selection output. Log_Count and Final_Grade were selected in all ten folds (100%), 
Mat_Access_Count in one fold (10%) and Activity_Rating in none, indicating that Log_Count and Final_Grade had the strongest influence.

=== Run information ===

Evaluator: weka.attributeSelection.CfsSubsetEval
Search: weka.attributeSelection.BestFirst -D 1 -N 5
Relation: LMS_Prediction_Merged_v6-weka.filters.unsupervised.attribute.Remove-R1-8,10,12,14,16-17,19-20
Instances: 248
Attributes: 5 (Log_Count, Mat_Access_Count, Activity_Rating, Exam_Points, Final_Grade)
Evaluation mode: 10-fold cross-validation

=== Attribute selection 10 fold cross-validation seed: 1 ===

number of folds (%)   attribute
10 (100%)             1 Log_Count
 1 ( 10%)             2 Mat_Access_Count
 0 (  0%)             3 Activity_Rating
10 (100%)             5 Final_Grade

Figure 2. Attribute Selection Output for MLR
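CfsSubsetEval scores attribute subsets with Hall's correlation-based feature selection (CFS) merit. The merit favours attributes that correlate with the class but not with one another: $\text{Merit}_S = \frac{k\,\bar r_{cf}}{\sqrt{k + k(k-1)\,\bar r_{ff}}}$. Here $k$ is the subset size, $\bar r_{cf}$ the mean attribute-class correlation and $\bar r_{ff}$ the mean attribute-attribute intercorrelation. BestFirst then searches the space of subsets using this score.

The sketch below takes the two averages as inputs, whereas WEKA computes the correlations internally. The numbers in the usage line are hypothetical and not taken from Fig. 2.

```python
import math


def cfs_merit(k, mean_feature_class_corr, mean_feature_feature_corr):
    """Correlation-based merit of a k-attribute subset (Hall's CFS heuristic).

    Higher attribute-class correlation raises the merit; higher redundancy
    among the attributes themselves lowers it.
    """
    if k < 1:
        raise ValueError("subset must contain at least one attribute")
    denominator = math.sqrt(k + k * (k - 1) * mean_feature_feature_corr)
    return k * mean_feature_class_corr / denominator


# Hypothetical averages for illustration only (not taken from Fig. 2).
print(round(cfs_merit(2, 0.6, 0.2), 4))  # 0.7746
```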

With preprocessing of attributes, and using the same cross-validation setting and confidence factor, the generated MLR model is shown in 
Equation 4.0. This model is referred to as MLR Model B in this study.

MLR Model B in equation form is:

$$\text{Exam\_Points} = 1 \times \text{Activity\_Rating} + 2 \times \text{Final\_Grade} + 0 \tag{4.0}$$
Table I shows the error measures of the two MLR models, generated without and with preprocessing of attributes. It compares 
how well each model fits, that is, the differences between the observed values and the model's predicted values. The correlation 
coefficient of MLR Model A is higher than that of Model B, while the MAE and RMSE of MLR Model B are greater than those of 
MLR Model A.


Table I. Error Measures of the MLR Models

Measure                           MLR Model A   MLR Model B
Correlation Coefficient           0.9999        0.9188
Mean Absolute Error (MAE)         0.1046        8.9507
Root Mean Squared Error (RMSE)    0.1383        11.6134
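The three measures in Table I follow standard formulas: Pearson correlation between observed and predicted values, mean absolute error, and root mean squared error. The sketch below computes them from paired values. The function name and sample values are hypothetical, and WEKA's figures come from its own evaluation run, not from this code.

```python
import math


def regression_error_measures(actual, predicted):
    """Return (Pearson correlation, MAE, RMSE) for paired observations."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    # Pearson correlation between observed and predicted values.
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_a == 0 or var_p == 0:
        raise ValueError("correlation undefined for constant values")
    corr = cov / math.sqrt(var_a * var_p)
    return corr, mae, rmse


# Hypothetical observed vs. predicted final ratings.
corr, mae, rmse = regression_error_measures([85, 90, 78, 92],
                                            [84.9, 90.2, 78.1, 91.8])
```

RMSE squares each error before averaging, so it is always at least as large as MAE. It grows faster when a few predictions are far off, which is one reason both measures are reported.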


Fig. 3 compares the performance of the two MLR models when evaluated on the test set. The figure shows 
that MLR Model B had a higher MAE and RMSE than MLR Model A.

Figure 3. Performance of the MLR Models based on Evaluation on the Test Set

B. Evaluation of the Developed Software by the Respondents 

As the developed software will be used by a wider population of DE learners and providers, the author asked selected active portal users to 
evaluate the system in terms of functionality, usability, reliability and portability. After exploring the 
demonstration, they were encouraged to fill out the online survey form.

One hundred ninety-one (191) online portal users filled out the survey form. The highlights of the 
information gathered from the participants are presented below.

Table II summarizes the evaluation of the developed software as perceived by the respondents. The ratings in this 
table indicate that the respondents rated the developed software "Moderately Acceptable", which signifies that it is functional, 
usable, reliable and portable for the DE stakeholders.

Table II. Summary of the Evaluation of the Developed Software by the Respondents

Parameters       Total Weighted Mean   Interpretation
Functionality    4.43                  Moderately Acceptable
Usability        4.34                  Moderately Acceptable
Reliability      4.29                  Moderately Acceptable
Portability      4.32                  Moderately Acceptable
Overall Mean     4.34                  Moderately Acceptable
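The Total Weighted Mean in Table II is presumably computed in the usual way for Likert-scale items. Each scale point is multiplied by the number of respondents who chose it, and the total is divided by the number of respondents. The sketch below assumes a 5-point scale and uses a hypothetical response distribution, since the paper lists neither its scale points nor its interpretation bands. It also assumes the Overall Mean is the plain average of the four parameter means, which gives about 4.345 from the values in Table II.

```python
def weighted_mean(response_counts):
    """Weighted mean of Likert responses (hypothetical helper).

    `response_counts` maps each scale point (assumed 1-5) to the number of
    respondents who chose it.
    """
    total = sum(response_counts.values())
    if total == 0:
        raise ValueError("no responses")
    return sum(point * count for point, count in response_counts.items()) / total


# Hypothetical distribution for one questionnaire item (191 respondents).
item = {5: 100, 4: 70, 3: 15, 2: 4, 1: 2}
print(round(weighted_mean(item), 2))  # 4.37

# Assumed definition of the Overall Mean: average of the parameter means.
parameter_means = {"Functionality": 4.43, "Usability": 4.34,
                   "Reliability": 4.29, "Portability": 4.32}
overall = sum(parameter_means.values()) / len(parameter_means)
```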
In terms of system functionality, the respondents found that the system can shut out access by people who are not part of the 
course. It provides online submission of assignments, whose results can be evaluated by the professor and then recorded in the 
database. This indicates effective integration management, which the system should maintain.

In terms of usability, most respondents rated the system "Moderately Acceptable". This indicates the versatility of the developed 
software, as it allows users to review, retrieve and interact with the system whenever they wish using their available devices.

In terms of reliability, the respondents rated all questions "Moderately Acceptable". This signifies that the developed system keeps 
each user account confidential and delivers and distributes information securely to its intended users: it generates notifications from 
authorized users only, provides consistent results and responds correctly when a failure is encountered.

In terms of portability, the system was also rated "Moderately Acceptable". This indicates that the system can be accessed from 
different devices (smartphone, tablet, laptop and personal computer). Thus, it allows users (OUS Administrator, course specialists 
and learners) to view course materials and online assessment tasks with a single login on any computer or mobile device. This gives 
greater flexibility to the DE learners, who were predominantly part-time students but full-time employees. The ratings on this parameter 
exhibit the portability of the system, which also supports the robustness of the developed model for predicting the online 
performance of DE learners.


IV. Conclusion 

The results demonstrated that the generated MLR model can be harnessed to develop the Learning Analytics Decision Support 
System, which may provide a powerful educational tool for analyzing and predicting the performance of learners in the flexible 
digital learning environment. During testing and simulation with real institutional data, the developed software produced the 
same output as two established applications, Microsoft Excel and WEKA. The respondents rated the 
developed software "Moderately Acceptable", with an overall mean of 4.34, which signifies that it is functional, usable, reliable and 
portable for the Distance Education (DE) stakeholders. Future research may use a greater number of 
instances and other variables, and may explore other data mining algorithms.
Abstract- This study developed a sentiment analyzer that utilizes sarcasm detection. A multilingual language model was used to classify 
complaints, using a probabilistic model with Kneser-Ney smoothing to improve the system's accuracy. Emoticons were also detected and 
included in the sarcasm detection of a service complaint. The researchers made use of a dataset of service complaints from a servicing 
company or government agency to ensure that the research would be based on real-life data. The system achieved a precision of 89%, a 
recall of 92% and an F1-score of 90%. The study shows that the accuracy of sarcasm detection increased upon the integration of emoticon 
detection, which in turn increased the accuracy of the system's sentiment analysis.

Keywords: Emoticon Detection, Language Model, Sentiment Analysis, Sarcasm Detection, Sarcasm and Emoticon Detection 

I. Introduction 


Sentiment analysis of review sites and online forums has been a popular subject in natural language processing for several years [1]. 
Before Internet use became widespread, many people asked their friends or neighbors for opinions about a good electronic product 
or food before buying or trying it. With the growing availability and popularity of opinion-rich resources such as online review 
websites and personal blogs, new opportunities and challenges arise, as people now can, and do, actively use information technologies 
to seek out and understand the opinions of others. Unfortunately, these opinion-rich resources are available in unstructured form, 
which has encouraged analysts to develop intelligent systems that can automatically categorize or classify these text documents.

Sarcasm is a form of art marked by the use of sarcastic language and intended to make its victim the butt of contempt or 
ridicule. In text mining, automatic detection of sarcasm is considered a difficult problem [2] and has been addressed in only a few 
studies. Sarcasm can be used to transform the polarity of an apparently positive or negative utterance into its opposite [3]. A study by 
Sagum et al. suggested that sarcasm detection can be used to increase the accuracy of a sentiment analyzer. Detecting 
sarcasm and emoticons in text is a complex process. To recognize sarcasm, tone must also be considered, since people 
express their feelings with high and low pitches [4]. To recognize emoticons, two points must be considered. First, emoticons 
represent body language, which is nonverbal. Second, methods for analyzing emoticons have been lacking, and there is a need 
to recognize the pattern or identify which emoticon has been used [5].

Considering these, the researchers developed a system that recognizes and analyzes emoticons and sarcastic statements, 
which they expect will help detect emotions accurately. The emoticons used were based on the article 
Smiley Face and Text Emoticon Symbols by Beal [6]. The system classifies the polarity of a complaint as 
positive, negative or neutral. The sentiment analyzer uses sarcasm detection and emoticon detection as features to 
improve its accuracy in analyzing customers' service complaints.
II. Related Works

The study of Wang et al. [3] explains that automatically detecting sarcasm on Twitter is a challenging task, since sarcasm transforms 
the polarity of an apparently positive or negative utterance into its opposite. Previous work focused on feature modelling of the 
single tweet, which limits performance on the task. Wang et al. therefore model sarcasm detection as a sequential classification 
task over a tweet and its contextual information.

The first study to address the use of emoticons to recognize sarcasm was done by Walther et al. [7]. Their participants were 
asked to read emails containing positive or negative messages, followed by a smiley face, a sad face, a wink face ;-) or no 
emoticon. The messages were ambiguous as to whether they were intended literally or sarcastically. The most sarcastic 
condition was a positive verbal message with a wink; however, this message-emoticon combination was not significantly more 
sarcastic than a positive message with a smile, a sad face or nothing at all. Therefore, the researchers concluded that winks do not 
actually connote greater sarcasm than other emoticons.

Derks et al. [8] ran a study similar to that of Walther et al., examining the same set of emoticons but including a neutral 
message condition (in addition to positive and negative), and the participants in their study were the recipients of the emails. In 
contrast to Walther et al., Derks et al. showed that emoticons enhanced the valence of a message. Additionally, and 
again in contrast to Walther et al., messages with a wink face were rated as significantly more sarcastic than those without an 
emoticon.

Hogenboom established that, in order to exploit emoticons in automated sentiment analysis, researchers first need to analyze 
how emoticons are typically related to the sentiment of the text they occur in. Interestingly, this positioning of emoticons 
suggests that an emoticon typically does not relate to a single word [8].

Lexical feature-based classification is the first type of sarcasm detection: text properties such as unigrams, bigrams and other n-grams 
are classified as lexical features of a text. The authors of [9] used these features to identify sarcasm, introducing the concept for the first 
time, and observed that lexical features play a vital role in detecting irony and sarcasm in text. Riloff et al. [10] used a 
well-constructed lexicon-based approach to detect sarcasm, generating the lexicon from unigram, bigram and trigram 
features. Barbieri et al. [11] considered seven lexical features to detect sarcasm through its inner structure, such as the intensity of the 
terms.
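A minimal sketch of the lexical features described above extracts contiguous unigrams, bigrams and trigrams from a token list. The function names are hypothetical, and the whitespace tokenization in the example is a simplifying assumption.

```python
def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as tuples."""
    if n < 1:
        raise ValueError("n must be positive")
    # Yields an empty list when the text is shorter than n tokens.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def lexical_features(tokens, max_n=3):
    """Collect unigram to max_n-gram features as space-joined strings."""
    features = []
    for n in range(1, max_n + 1):
        features.extend(" ".join(gram) for gram in ngrams(tokens, n))
    return features


tokens = "wow great service again".split()
print(ngrams(tokens, 2))
# [('wow', 'great'), ('great', 'service'), ('service', 'again')]
```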

Pragmatic feature-based classification, the second type of sarcasm detection, exploits symbolic and figurative text, which is frequent 
in tweets because of their limited message length. These symbolic and figurative texts are called pragmatic features (such as 
smiles, emoticons, replies and @user mentions). They are among the most powerful features for identifying sarcasm in tweets, and 
several authors have used them in their work. Pragmatic features are among the key features used by Kreuz and Caucci [9] to 
detect sarcasm in text. Carvalho et al. [12] used pragmatic features such as emoticons and special punctuation to 
detect irony in newspaper text data.
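Pragmatic cues such as emoticons, @user mentions and runs of special punctuation can be counted with regular expressions. The patterns below are deliberately small and hypothetical. They cover common Western emoticons such as :) , :-( , ;) and :D, and they are not the feature sets used in [9] or [12].

```python
import re

# Hypothetical patterns for three pragmatic cues named in the text.
EMOTICON_RE = re.compile(r"[:;=]-?[()DPp]")   # e.g. :) :-( ;) :D
MENTION_RE = re.compile(r"@\w+")              # @user mentions
PUNCT_RUN_RE = re.compile(r"[!?]{2,}")        # e.g. "!!", "?!"


def pragmatic_features(text):
    """Count emoticons, @user mentions and runs of repeated ! or ? marks."""
    return {
        "emoticons": len(EMOTICON_RE.findall(text)),
        "mentions": len(MENTION_RE.findall(text)),
        "punctuation_runs": len(PUNCT_RUN_RE.findall(text)),
    }


print(pragmatic_features("@agency wow, fixed the road again?! :D"))
# {'emoticons': 1, 'mentions': 1, 'punctuation_runs': 1}
```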

These studies highlight that sarcasm detection can improve the accuracy of sentiment analysis, and that emoticons can affect the 
accuracy of sarcasm detection. The researchers made use of these features to examine the accuracy of a sentiment analyzer once 
sarcasm and emoticon detection are taken into consideration.


III. Discussion of the System’s Design 

The process of the system starts with pre-processing (see Figure 1), which includes tokenization, sentence splitting 
and text normalization. The researchers used Python's NLTK for this module.
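The pre-processing step and the smoothed Language Model it feeds (item b below) can be sketched together. This is a stdlib-only approximation, not the authors' NLTK code. Sentences are split after ., ! or ?, tokens are extracted with a regular expression that keeps emoticons intact, and lowercasing is the only normalization.

The tokens then feed an interpolated Kneser-Ney model. This is a simplification on several counts:
- The model is bigram only, whereas the paper goes up to trigrams.
- The discount of 0.75 is an assumed common default.
- The class name `KneserNeyBigram` is hypothetical.

NLTK itself provides `nltk.lm.KneserNeyInterpolated` for higher orders.

```python
import re
from collections import Counter, defaultdict

EMOTICON = r"[:;=]-?[()DPp]"
TOKEN_RE = re.compile(rf"{EMOTICON}|\w+|[^\w\s]")


def split_sentences(text):
    """Split after ., ! or ? followed by whitespace (regex approximation)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Tokenize, keeping emoticons intact and lowercasing everything else."""
    return [t if re.fullmatch(EMOTICON, t) else t.lower()
            for t in TOKEN_RE.findall(sentence)]


class KneserNeyBigram:
    """Interpolated Kneser-Ney smoothing for a bigram model (hypothetical)."""

    def __init__(self, sentences, discount=0.75):
        self.d = discount
        self.bigrams = Counter()
        for tokens in sentences:
            padded = ["<s>"] + tokens + ["</s>"]
            self.bigrams.update(zip(padded, padded[1:]))
        self.history_count = Counter()        # c(v): bigram tokens starting with v
        self.followers = defaultdict(set)     # distinct words seen after v
        self.predecessors = defaultdict(set)  # distinct words seen before w
        for (v, w), count in self.bigrams.items():
            self.history_count[v] += count
            self.followers[v].add(w)
            self.predecessors[w].add(v)

    def continuation(self, w):
        """P_cont(w): share of distinct bigram types that end in w."""
        return len(self.predecessors.get(w, ())) / len(self.bigrams)

    def prob(self, w, v):
        """P_KN(w | v); falls back to P_cont(w) for an unseen history v."""
        c_v = self.history_count[v]
        if c_v == 0:
            return self.continuation(w)
        discounted = max(self.bigrams[(v, w)] - self.d, 0) / c_v
        backoff_weight = self.d * len(self.followers[v]) / c_v
        return discounted + backoff_weight * self.continuation(w)


text = "Napakagaling nyo gumawa ng daan. Lubak lubak parin :D"
corpus = [tokenize(s) for s in split_sentences(text)]
lm = KneserNeyBigram(corpus)
print(corpus[1])  # ['lubak', 'lubak', 'parin', ':D']
```

The backoff weight is exactly the probability mass removed by discounting. For any seen history, the probabilities over the vocabulary therefore still sum to one.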

The next phase is the Polarity Classification Module, which consists of the following:

a. Initial Polarity Classifier: distinguishes whether the text is positive, negative or neutral using 
bag-of-words classification and the Language Model, and assigns a polarity to each token in the 
sentences.
b. Language Model: assigns a probability to a sequence of strings based on its occurrence in previously processed text. The 
model uses n-grams and the Kneser-Ney smoothing algorithm to improve the probability estimates of each n-gram. The 
training data is a collection of complaint sentences classified by polarity as positive, negative or 
neutral. The model uses n-grams up to the trigram, because the data consist of Tagalog, 
English and Taglish complaints whose sentence structure can be processed correctly at this order.

c. Sarcasm Detection: implemented after the polarity of each text has been set. The algorithms 
implemented in this process were a Probabilistic Model and a Regular Expression Model. Using the model, a word's 
polarity score ranges from -1.0 to -0.4 for negative, -0.399 to 0.399 for neutral, and 0.4 to 1.0 for positive.

e.g. [‘Napakagaling’, 0.642, Pos], [‘ninyo’, -0.132, Neu], [‘gumawa’, 0.539, Pos], [‘ng’, 0.539, Neu], [‘daan’, 
-0.205, Neu], [‘.’, -0.132, Neu], [‘Lubak’, -0.742, Neg], [‘Lubak’, -0.742, Neg], [‘parin’, 0.143, Neu], 
[‘:D’, 0.053, Neu]

c.1. Lexical Feature Classifier: the first to detect sarcasm, using text properties such as unigrams, 
bigrams, trigrams and other n-grams, each of which is classified as a lexical feature of the text.

c.2. Hyperbole Feature: composed of an Intensifier that searches for keywords denoting the intensity or degree 
of a given text. These keywords are used to increase or decrease the positivity or negativity of certain 
words. The researchers used the Regular Expression Model to detect sarcasm by observing patterns 
that denote sarcastic complaints in a given text. Interjections such as "wow", "aha" and "yay" have a higher 
chance of being sarcastic, and series of punctuation marks are used as additional cues for sarcasm 
detection.

d. Emoticon Detection: adds a value to the polarity if an emoticon is placed in a given text, adding a positive, 
negative or neutral weight to its context. The emoticon lexicon used in this research contains all the emoticons included in the 
American English Twitter corpus study. In the example, the detected emoticon's polarity 
shifted to positive.

e.g. [‘:D’, Neu] -> [‘:D’, Pos] 

e. Final Polarity Classifier: computes the overall polarity weights from the Sarcasm and Emoticon Detection. 
In addition, a rule-based check was implemented: if a sentence's sentiment is Positive and it is followed by 
a negative emoticon, or if it is Negative and followed by a positive emoticon, the sentence is classified as 
sarcastic. Lastly, it outputs the classified service complaint, indicating whether it is 
sarcastic and whether an emoticon is present. In the example, the first sentence has a 
low probability of being positive and the second a higher probability of being negative, which 
shifts the overall context of the complaint to a negative sentiment. This negative sentence is followed by a 
positive emoticon, so sarcasm is detected by applying the rule of a negative sentence followed by a positive emoticon.

Sarcasm Detection: Napakagaling nyo gumawa ng daan. Lubak lubak parin -> NEG

Emoticon Detection: :D -> POS

NEGATIVE (Sentiment) + POSITIVE (Emoticon) -> Sarcastic
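The worked example can be traced with the rules from items c, c.2, d and e. The sketch below has three parts:
- The polarity bands are the ones stated in item c.
- The emoticon lexicon and the interjection cue pattern are small hypothetical stand-ins; the study's lexicon follows [6].
- The sentence-level polarity is passed in as an input rather than computed, because the paper does not specify how the per-token weights are aggregated.

```python
import re


def polarity_label(score):
    """Map a word's polarity score to Neg, Neu or Pos (bands from item c)."""
    if score <= -0.4:
        return "Neg"
    if score >= 0.4:
        return "Pos"
    return "Neu"


# Hypothetical mini-lexicon; the study's emoticon list follows Beal [6].
EMOTICON_POLARITY = {":D": "Pos", ":)": "Pos", ":-)": "Pos",
                     ":(": "Neg", ":-(": "Neg"}

# Hypothetical cue pattern for item c.2: interjections or runs of ! / ?.
CUE_RE = re.compile(r"\b(?:wow|aha|yay)\b|[!?]{2,}", re.IGNORECASE)


def has_sarcasm_cue(text):
    """True if the text contains an interjection or a punctuation run."""
    return CUE_RE.search(text) is not None


def emoticon_polarity(tokens):
    """Polarity of the last known emoticon in the tokens, or None (item d)."""
    for token in reversed(tokens):
        if token in EMOTICON_POLARITY:
            return EMOTICON_POLARITY[token]
    return None


def is_sarcastic(sentence_polarity, tokens):
    """Item e rule: sentiment followed by an emoticon of opposite polarity."""
    return (sentence_polarity, emoticon_polarity(tokens)) in {
        ("Pos", "Neg"), ("Neg", "Pos")}


# Worked example: "Napakagaling nyo gumawa ng daan. Lubak lubak parin :D"
print([polarity_label(s) for s in (0.642, -0.132, -0.742, 0.053)])
# ['Pos', 'Neu', 'Neg', 'Neu']
print(is_sarcastic("Neg", ["Lubak", "lubak", "parin", ":D"]))  # True
```

The emoticon :D scores 0.053 as a word, so it is labelled Neu. The emoticon lexicon then overrides this to Pos, which is the shift shown in item d.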

IV. Presentation of Results 


The performance of the system was measured using the F-measure or F1 score (Eq. 1), the harmonic mean of precision (Eq. 2) and recall (Eq. 3).

$$F_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} \tag{1}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{2}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{3}$$


The researchers used the contingency table of Dan Jurafsky of Stanford University to classify the output of the system. It uses True 
Positive (TP), the number of sentiments classified correctly by both the system and the human annotator; True Negative (TN), the number 
of sentiments that both the system and the human annotator classified as not belonging to the criterion; False Positive (FP), the number of sentiments 
that the system classified as belonging to the criterion but the human annotator did not; and False Negative (FN), the number of sentiments that the human 
annotator classified as belonging to the criterion but the system classified incorrectly or as neutral.


Table 1 shows the results of the evaluation for each criterion. For Complaints with Emoticon, precision was 
100% because every complaint the system flagged was analyzed correctly; recall was 67% because one complaint 
was not recognized by the system, giving an F1-score of 80%. For Complaints with Sarcasm and Emoticon, recall was 100% 
because all 258 relevant complaints were correctly recognized, and precision rounded to 100% (258 of 259, with one 
false positive), giving an F1-score of 100%. For Plain Complaints, precision, recall and F1-score were 84%, 80% and 82% 
respectively. Lastly, Complaints with Sarcasm obtained 34%, 42% and 38% for precision, recall and F1-score respectively. This is 
attributed to the unclean dataset fed into the system during the evaluation phase and to the dataset's lack of complaints with 
sarcastic features.


Table 1: Summary of Results according to Criteria

Criteria                               TP    TN   FP   FN   Precision   Recall   F1 Score
Complaints with Sarcasm                38   552   73   52      34%        42%       38%
Complaints with Emoticon                2   712    0    1     100%        67%       80%
Complaints with Sarcasm and Emoticon  258   456    1    0     100%       100%      100%
Plain Complaints                      290   298   54   73      84%        80%       82%
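Equations 1 to 3 can be checked against Table 1. The sketch below computes precision, recall and F1 from the TP, FP and FN counts of each criterion; TN does not enter these measures. The function name is hypothetical. Run as-is, it prints the same rounded percentages as Table 1.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision (Eq. 2), recall (Eq. 3) and their harmonic mean F1 (Eq. 1)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# (TP, FP, FN) counts copied from Table 1.
TABLE_1 = {
    "Complaints with Sarcasm": (38, 73, 52),
    "Complaints with Emoticon": (2, 0, 1),
    "Complaints with Sarcasm and Emoticon": (258, 1, 0),
    "Plain Complaints": (290, 54, 73),
}

for criterion, counts in TABLE_1.items():
    p, r, f = precision_recall_f1(*counts)
    print(f"{criterion}: P={p:.0%} R={r:.0%} F1={f:.0%}")
```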


Table 2: Overall System Performance in Recognizing Sentiments

Precision   89%
Recall      92%
F1 Score    90%


Table 2 shows the overall performance of the system for sentiment analysis, regardless of criterion, when sarcasm detection is 
integrated with emoticon detection. Based on the rating system of Ebola et al. [13] for the parameters precision, recall and F1 score, 
the system's accuracy rate is "Very Good".
V. Conclusion

In this study, a language model was used to determine the sentiment of service complaints. Sarcasm detection through pattern extraction was employed to improve the overall accuracy of the system. Lastly, the researchers included emoticon detection to achieve higher accuracy in detecting sarcasm in service complaints.

The system served its purpose: to detect sarcasm and integrate that detection into the system in order to analyze sentiment accurately. Compared with the previous study (Sagum, De Vera, Lansang, Narciso, & Respeto, 2015) [12], which used a smoothing algorithm for the language model, the system's precision and recall were higher. However, there are numerous words that the program was not able to recognize correctly. Human raters typically regard 70% as a passing rate for sentiment analyzers [14] [13] [4]. Thus, a sentiment analyzer with an 82% accuracy rate performs about as well as humans do in analyzing sentiment, but there is still room for improvement. The results of the study showed that the system was able to perform the detection correctly on service complaints, and it was rated “Very Good” in performing sentiment analysis.

The results also showed F1 scores ranging from 50% to 100%, which corresponds to a satisfactory-to-excellent rating, although this could be improved with a larger implementation dataset. The researchers observed that the F1 score and accuracy for the criterion Complaints with Sarcasm and Emoticon were higher than for the criterion Complaints with Sarcasm. The researchers achieved their goal of integrating emoticon detection into sarcasm detection for better recognition of sarcasm, improving on the previous study of Sagum et al. [4]. The researchers found that emoticon detection is effective in achieving higher accuracy when integrated into sarcasm detection.

VI. Recommendations 

The implementation set consisted of mixed complaints with different sentiments, so the results were less accurate. The researchers recommend testing the system with a separate set of complaints for each sentiment (Positive, Negative and Neutral) to obtain a more accurate measure of the sentiment analyzer's capability to classify sarcastic complaints.

The researchers recommend using Chen and Goodman's modified Kneser-Ney smoothing, which has been found in empirical comparisons to be the best-performing smoothing method for these kinds of problems. However, a probabilistic model with modified Kneser-Ney smoothing requires a much larger amount of training data.
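For readers unfamiliar with Kneser-Ney smoothing, a minimal sketch of interpolated Kneser-Ney for a bigram model is shown below. It is a simplification, not the modified variant: it uses a single absolute discount D (0.75 here, an assumed value), whereas Chen and Goodman's modified Kneser-Ney uses separate discounts for counts of 1, 2 and 3 or more. The estimate is P(w|v) = max(c(v,w) - D, 0)/c(v) + λ(v)·P_cont(w), with λ(v) = D·N1+(v •)/c(v) and P_cont(w) = N1+(• w)/N1+(• •), where N1+ counts distinct bigram types.

```python
from collections import Counter

def train_kn_bigram(sentences, discount=0.75):
    """Return prob(word, history) for an interpolated Kneser-Ney bigram model."""
    bigrams = Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        bigrams.update(zip(padded, padded[1:]))

    history_count = Counter()  # c(v): total bigram tokens starting with v
    followers = Counter()      # N1+(v .): distinct words seen after v
    predecessors = Counter()   # N1+(. w): distinct histories seen before w
    for (v, w), count in bigrams.items():
        history_count[v] += count
        followers[v] += 1
        predecessors[w] += 1
    bigram_types = len(bigrams)  # N1+(. .)

    def prob(word, history):
        # Continuation probability: in how many different contexts does the word appear?
        p_cont = predecessors[word] / bigram_types
        c_v = history_count[history]
        if c_v == 0:  # unseen history: use the continuation distribution alone
            return p_cont
        discounted = max(bigrams[(history, word)] - discount, 0) / c_v
        lam = discount * followers[history] / c_v  # probability mass freed by discounting
        return discounted + lam * p_cont

    return prob

prob = train_kn_bigram([["the", "service", "is", "slow"], ["the", "service", "is", "great"]])
print(prob("slow", "is"), prob("the", "is"))
```

Words never seen in training receive zero probability in this sketch; a practical model would add an unknown-word token.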

Future researchers are also encouraged to use other combinations of features in sarcasm detection, such as slang detection, irony detection and rant detection, in a hybrid manner to improve the accuracy of sarcasm detection and thereby the accuracy of the sentiment analyzer.
ABSTRACT: Dogs and cats are considered beloved pets by most people, but these animals are also prone to diseases such as colds, ticks and fleas, worms, and fungal infections. Early detection leads to early prevention and cure: detecting diseases at an early stage makes it possible to overcome and treat them appropriately, and identifying the treatment accurately depends on the method used to diagnose the diseases. This study, entitled “Diagnostic App for Cats and Dogs Diseases using Neuro-Fuzzy Algorithm”, developed a mobile application that recognizes dogs’ and cats’ diseases using a neuro-fuzzy algorithm, and aimed to test the accuracy of the neuro-fuzzy algorithm in the mobile app. The researchers used Android Studio as the development platform and Java as the programming language; the app runs on Android KitKat or later versions. The researchers used an experimental research method to evaluate the accuracy of the mobile app with the neuro-fuzzy algorithm in diagnosing dogs’ and cats’ diseases in terms of precision, recall, and F-measure. The accuracy of the system was measured through a series of experiments with the help of an expert. The researchers used 171 test cases for dogs’ diseases and 124 for cats’ diseases to test the accuracy of the mobile app. The mobile app with the neuro-fuzzy algorithm attained an accuracy of 87% in diagnosing dogs’ diseases and 90% in diagnosing cats’ diseases, for an overall accuracy of 88.50%. Hence, the researchers concluded that the accuracy of the Diagnostic App for Cats and Dogs Diseases using Neuro-Fuzzy Algorithm is very high. The developed mobile app can diagnose dogs’ and cats’ diseases and advise on the proper treatment for each illness, and it could guide pet owners in taking good care of their beloved animals. We therefore recommend using the mobile app in diagnosing dogs’ and cats’ diseases.

Keywords: Neuro Fuzzy, Diagnostic Mobile App, Pet App 

1. Introduction 

According to Veterinary Pet Insurance, which lists the 10 most common diseases affecting dogs and cats, common conditions in dogs and cats include skin allergies, ear infections, non-cancerous skin masses, skin infections, arthritis, vomiting or upset stomach, periodontitis or dental disease, diarrhea or intestinal upset, bladder or urinary tract infections, soft tissue trauma (bruises or contusions), excessive thyroid hormone, upper respiratory infections, and lymphoma (Association, 2016). Preventive healthcare involves a multi-faceted approach that includes a veterinary evaluation of the pet’s overall health and its risks of disease or other health problems.

Many pet owners turn to the internet and try to diagnose and treat their pets’ conditions themselves (Zander, 2016). However, different sites offer thousands of different pieces of advice, and finding the right one can be difficult, especially when it comes from different sources. Applications on the market, such as Easy Vet, are made specifically for veterinarians (Technologies, 2015).
Diagnosing diseases and identifying the right treatment can remedy pets’ illnesses. Detecting pet diseases at an early stage prevents serious illness and allows them to be treated properly (Patra, Sahu, & Mandal, 2010). A Diagnosis Expert System (DExS) can serve as a guide in identifying diseases and suggesting methods for curing them.

According to the study of Andrews et al. (2015), veterinarians have a strong desire for mobile technology in veterinary medicine and believe its use will allow them to practice more effectively. Results showed that mobile devices are prevalent and widespread among veterinarians, with more than sixty percent of those surveyed strongly agreeing that mobile technology will advance patient care and client communication and improve access to clinical data and medical literature.

2. Background of the Problem

The Care for Animal Organization believes that “animals are a wonderful part of people’s lives. They bring joy and happiness, and give unconditional love. They bring sorrow when they leave, but most of all, they leave with cherished memories of true friendship.” (CFA, 2017)

Pet health care is one thing pet owners want to get right (Haight, 2013). Regular veterinary visits are good not only for the pet but also for the pet owner’s wallet. Early detection of illnesses such as food allergies and urinary tract infections can help prevent or cure these problems before they become serious or extremely expensive.

Current research such as Virtua-Vet (Floresca, Jaymalin, Taguba, & Zapanta, 2010) is a big help in preventing diseases in cats and dogs. This research uses Natural Language Understanding techniques to determine keywords in users’ input questions. However, the available data are very limited, since it focuses only on determining common diseases. Still, this research is very useful to pet owners who seek expert advice for their pets but cannot afford the potential cost of a veterinary consultation.

A similar study has been found, but it focuses on the diagnosis of diabetes. Morales and Tomines (2016) developed a mobile application that runs on the Android operating system and calculates a user’s risk of having diabetes from a set of risk factors and symptoms. That study aims to provide self-awareness and early detection of diabetes to avoid further complications, and it uses a fuzzy logic algorithm and a genetic algorithm.

This research had the following objectives:

1. Develop a mobile application that can diagnose dogs’ and cats’ diseases.

2. Implement the Neuro-Fuzzy Algorithm in developing the mobile application.

3. Test the accuracy of the Neuro-Fuzzy Algorithm in the developed mobile application in diagnosing dogs’ and cats’ diseases in terms of Precision, Recall and F-Measure.

4. Determine the overall accuracy of the Neuro-Fuzzy Algorithm in the developed mobile application in diagnosing dogs’ and cats’ diseases.

This study will be a great help to pet owners and veterinarians. Pet owners, specifically those who keep dogs and cats as home pets, will benefit most from this system. It can help them save money, because they do not always need to visit a veterinary clinic when their pets are sick. For veterinarians, the developed system can serve as a tool for diagnosing pets’ diseases and makes their work easier when giving a diagnosis.

Future researchers who want to study an Android-based expert system for diagnosing pet diseases can use this study as their basis, and other researchers can improve on it for better development of the system. In the field of Computer Science, this study is beneficial because, given the continuing trend of technology today, an expert system for diagnosing different pets’ diseases will be a great contribution.

3. Methodology 
3.1. Research Methodology 

This study used an experimental research design. The experimental research approach is a collection of research designs that use manipulation and controlled testing to understand causal processes. Generally, one or more variables are manipulated to determine their effect on a dependent variable. In such an experiment, the researcher manipulates one variable and controls or randomizes the rest. There is a control group, the subjects are randomly assigned between the groups, and the researcher tests only one effect at a time. It is also important to know which variables are to be tested and measured.

Experiments are conducted to predict phenomena. Typically, an experiment is constructed to explain some kind of causation. Experimental research is important to society because it helps us improve our everyday lives (Blakstad, 2008). The researcher aims to maintain control over all factors that may affect the result of an experiment and, in doing so, attempts to determine or predict what may occur.

3.2. System Architecture 

In the system architecture presented in Figure 2, the user first selects the type of pet, chooses from a list of symptoms, and answers the questions. All selected symptoms and the answers from the Q&A then go to the neuro-fuzzy model. The output of the developed system is the name of the recognized disease, its description, and suggestions.
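The paper does not publish its network structure, rules or membership functions, so the following is only a hypothetical sketch of how the pet type, selected symptoms and Q&A answers could flow through a neuro-fuzzy model. Each answer in [0, 1] is fuzzified with a sigmoid membership function, each disease rule fires with the product of its symptoms' memberships (as in the first layers of an ANFIS-style network), and the normalized firing strengths rank the diseases. The symptom names, rule bases, center and slope are invented for illustration; in a trained neuro-fuzzy system the membership parameters would be learned from data, a step omitted here.

```python
import math

# Hypothetical rule bases keyed by pet type; NOT the paper's actual rules.
RULE_BASES = {
    "dog": {
        "Kennel Cough": ["cough", "nasal_discharge", "lethargy"],
        "Canine Parvovirus": ["vomiting", "bloody_diarrhea", "lethargy"],
    },
    "cat": {
        "Ear Mites": ["ear_scratching", "dark_ear_debris"],
        "Respiratory Infection": ["sneezing", "nasal_discharge", "fever"],
    },
}

def mu_present(x, center=0.5, slope=10.0):
    """Layer 1: sigmoid membership degree that a symptom is 'present' (x in [0, 1])."""
    return 1.0 / (1.0 + math.exp(-slope * (x - center)))

def diagnose(pet, answers):
    """Layers 2-3: product firing strength per disease rule, then normalization."""
    strengths = {}
    for disease, symptoms in RULE_BASES[pet].items():
        w = 1.0
        for s in symptoms:
            w *= mu_present(answers.get(s, 0.0))  # unanswered symptoms treated as answer 0
        strengths[disease] = w
    total = sum(strengths.values())  # always > 0 because the sigmoid is never 0
    normalized = {d: w / total for d, w in strengths.items()}
    return max(normalized, key=normalized.get), normalized

best, scores = diagnose("cat", {"ear_scratching": 1.0, "dark_ear_debris": 0.9})
print(best, {d: round(p, 3) for d, p in scores.items()})
```

In a full app, the returned disease name would then be used to look up its description and treatment suggestions.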
Figure 2 - System Architecture (diagram not reproduced; its first step is "Select Type of pet")

4. Results and Discussions


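Table 4.1 - DOGS' DISEASES

| Name of Disease | True Value | TP | FP | FN | Precision | Recall | F-Measure |
|---|---|---|---|---|---|---|---|
| Canine Parvovirus | 18 | 15 | 5 | 3 | 75% | 83% | 79% |
| Kennel Cough | 17 | 15 | 2 | 2 | 88% | 88% | 88% |
| Distemper | 17 | 15 | 0 | 2 | 100% | 88% | 94% |
| Demodectic Mange | 17 | 16 | 4 | 1 | 80% | 94% | 86% |
| Sarcoptic Mange | 17 | 15 | 3 | 2 | 83% | 88% | 86% |
| Leptospirosis | 17 | 14 | 6 | 3 | 70% | 82% | 76% |
| Ehrlichiosis | 17 | 13 | 1 | 4 | 93% | 76% | 84% |
| Pyoderma | 17 | 14 | 0 | 3 | 100% | 82% | 90% |
| Ear Mites | 17 | 17 | 0 | 0 | 100% | 100% | 100% |
| Seborrhea | 17 | 15 | 1 | 2 | 94% | 88% | 91% |
| AVERAGE | | | | | 88% | 87% | 87% |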
Table 4.1 above depicts the performance of the mobile app in diagnosing common dogs’ illnesses in terms of precision, recall and F-measure. Ear Mites achieved 100% in precision, recall and F-measure. Distemper and Pyoderma also achieved 100% precision, while Demodectic Mange had the second-highest recall (94%). Distemper had the second-highest F-measure, at 94%. Leptospirosis had the lowest precision and F-measure, at 70% and 76% respectively, and Ehrlichiosis had the lowest recall, at 76%. The average precision, recall and F-measure of the mobile app in diagnosing dogs’ diseases are 88%, 87% and 87%, respectively.


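Table 4.2 - CATS' DISEASES

| Name of Disease | True Value | TP | FP | FN | Precision | Recall | F-Measure |
|---|---|---|---|---|---|---|---|
| Urinary Tract Infection | 21 | 19 | 5 | 2 | 79% | 90% | 84% |
| Poisoning | 21 | 16 | 4 | 5 | 80% | 76% | 78% |
| Feline Worms | 21 | 19 | 0 | 2 | 100% | 90% | 95% |
| Fungal Dermatitis | 21 | 21 | 1 | 0 | 95% | 100% | 98% |
| Ear Mites | 20 | 19 | 0 | 1 | 100% | 95% | 97% |
| Respiratory Infection | 20 | 18 | 2 | 2 | 90% | 90% | 90% |
| AVERAGE | | | | | 91% | 90% | 90% |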


Table 4.2 shows the performance of the mobile app in diagnosing common cats’ illnesses in terms of precision, recall and F-measure. Feline Worms and Ear Mites were identified with 100% precision, while Fungal Dermatitis achieved 100% recall. The highest F-measure is that of Fungal Dermatitis, at 98%, followed by Ear Mites at 97%. The average precision, recall and F-measure of the mobile app in diagnosing cats’ diseases are 91%, 90% and 90%, respectively.

5. Conclusions and Recommendations 

Table 5.1 shows the overall accuracy of the mobile app with the Neuro-Fuzzy Logic Algorithm in diagnosing dogs’ and cats’ diseases. With an accuracy of 87% in diagnosing dogs’ diseases and 90% in diagnosing cats’ diseases, the researchers attained an overall accuracy of 88.50%. The Neuro-Fuzzy Logic Algorithm therefore performs well in determining pets’ diseases, and the researchers concluded that the accuracy of the system in diagnosing a disease is very high [Artigo et al. 2015].


Table 5.1 - Overall Accuracy Performance

| Pet | Percentage |
|---|---|
| Dogs’ Diseases | 87.00% |
| Cats’ Diseases | 90.00% |
| Average | 88.50% |
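The species-level figures in Table 5.1 appear to be the average F-measures of Tables 4.1 and 4.2 (87% and 90%), and 88.50% is their mean. The sketch below recomputes them as macro averages (each disease weighted equally) from the TP, FP and FN counts in those tables, taking FP = 5 for Urinary Tract Infection, which is what its 79% precision implies. Averaging the unrounded species values gives about 88.9% overall; the reported 88.50% corresponds to averaging the rounded 87% and 90%.

```python
# (TP, FP, FN) per disease, in table order, from Tables 4.1 and 4.2
DOGS = [(15, 5, 3), (15, 2, 2), (15, 0, 2), (16, 4, 1), (15, 3, 2),
        (14, 6, 3), (13, 1, 4), (14, 0, 3), (17, 0, 0), (15, 1, 2)]
CATS = [(19, 5, 2), (16, 4, 5), (19, 0, 2), (21, 1, 0), (19, 0, 1), (18, 2, 2)]

def f_measure(tp, fp, fn):
    # F = 2PR / (P + R), which simplifies to 2TP / (2TP + FP + FN)
    return 2 * tp / (2 * tp + fp + fn)

def macro_f(rows):
    """Unweighted mean of the per-disease F-measures."""
    return sum(f_measure(*row) for row in rows) / len(rows)

dog_f, cat_f = macro_f(DOGS), macro_f(CATS)
print(f"dogs {dog_f:.1%}, cats {cat_f:.1%}")
print(f"overall from unrounded values {(dog_f + cat_f) / 2:.1%}")
print(f"overall from rounded values {(round(dog_f * 100) + round(cat_f * 100)) / 2:.2f}%")
```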


The developed mobile app with the Neuro-Fuzzy Algorithm can diagnose dogs’ and cats’ diseases and can advise on the proper treatment for each illness. It could serve as a guide for pet owners in taking good care of their beloved animals. We therefore recommend using the mobile app in diagnosing dogs’ and cats’ diseases.

Lastly, the researchers recommend the following:

1. Provide, where necessary, a specific or unique symptom for every disease so that each disease can be recognized correctly.

2. Add an algorithm that automates the addition of rules based on the patterns provided.

3. Make the symptoms of every disease more specific and easier for the user to understand.

4. In future studies, researchers may improve the system and test the significance of the difference between the experts’ diagnoses and the system’s diagnoses.
Abstract: In this digital era, it is very important to understand consumer needs when dealing with large volumes of data. In this paper, we focus on consumer queries and complaints. Manually sorting the queries and complaints in forums or online discussion sites related to a specific topic is a difficult task. We propose a method that automatically classifies the queries posted by consumers into their correct classes. The system performs this classification using a technique called Long Short-Term Memory (LSTM). An LSTM network can learn long-term dependency features directly from the dataset without any manual effort. The model showed considerable accuracy when tested with validation data.

Keywords: Artificial Intelligence; Deep Learning; Feed Forward Network; Recurrent Neural Network; Long Short-Term 
Memory. 


INTRODUCTION 

Machine learning algorithms need pre-defined features to work, and identifying salient features from data is a very difficult task. Domain knowledge is essential for applied machine learning. The process of transforming data into features is called feature engineering, and it is time-consuming. In deep learning, the neural network learns the features automatically from raw data.

A Feed Forward Network (FNN) is a type of Artificial Neural Network (ANN) in which information moves in the forward direction only. The simplest FNN has no hidden layers. In an FNN with a single perceptron, the output is computed from the sum of the products of the inputs and their weights. When we step back and look at the data, we can understand its patterns. By storing these patterns, we can predict the next element of a sequence just by seeing the previous elements.
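Reading "the sum of the product of their weights" as the weighted sum of the inputs, a single perceptron can be sketched as below. The step activation, the bias term and the AND-gate weights are illustrative assumptions not taken from the text.

```python
def perceptron(inputs, weights, bias=0.0):
    """Single perceptron: weighted sum of the inputs plus bias, then a step activation."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if total > 0 else 0

# Illustrative weights that make the perceptron compute logical AND on binary inputs
weights, bias = [1.0, 1.0], -1.5
for a in (0, 1):
    for b in (0, 1):
        print(a, b, perceptron([a, b], weights, bias))
```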

A Recurrent Neural Network (RNN) stores information in memory over time. However, the vanishing gradient problem makes it difficult for an RNN to learn long-term dependencies. The network is trained using the backpropagation algorithm, which uses the chain rule to compute the derivatives or partial derivatives of a function.


RNNs require a more complex architecture than non-recurrent networks, and applying the chain rule requires a lot of computation. The output of an RNN is used not only to compute the recurrent value but also to compute the value for the next time step. Deep neural networks have many hidden layers. The fundamental flaw of a recurrent neural network is the number of multiplications required to compute the updated weights: the gradients that reach earlier time steps are products of many small numbers, so it is hard for an RNN to learn from the past. Consider the sample sequence "A saw B, B saw C, C saw D", and suppose we need to predict what follows 'A'. 'A' will strongly vote for 'saw', and 'B' will vote for a comma. The word 'saw' has an equal chance of predicting B, C or D, so the network may make a wrong prediction. To predict correctly, we need to see what happened in the previous steps.
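The vanishing-gradient argument can be made concrete with a one-unit RNN, h_t = tanh(w·h_{t-1} + x_t). By the chain rule, ∂h_T/∂h_0 is the product of the per-step factors w·(1 - h_t²); when |w| < 1 each factor has magnitude below 1, so the product shrinks geometrically as the sequence gets longer. The weight 0.9 and the constant input 0.5 are arbitrary illustrative values.

```python
import math

def gradient_through_time(w, inputs, h0=0.0):
    """Return dh_T/dh_0 for the scalar RNN h_t = tanh(w*h_{t-1} + x_t), via the chain rule."""
    h, grad = h0, 1.0
    for x in inputs:
        h = math.tanh(w * h + x)
        grad *= w * (1.0 - h * h)  # local derivative dh_t/dh_{t-1}
    return grad

for steps in (1, 5, 10, 20):
    print(steps, gradient_through_time(0.9, [0.5] * steps))
```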

LSTM is a type of RNN with a set of gates that control the flow of information. The gates select and forget information as it enters the memory, and on/off gates decide what to release as a prediction and what to keep internal.
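The gating described above follows the standard LSTM cell equations: a forget gate f, an input gate i, a candidate g and an output gate o, with c_t = f ⊙ c_{t-1} + i ⊙ g and h_t = o ⊙ tanh(c_t). A minimal NumPy sketch of one time step is shown below; the weights are random and the sizes arbitrary, purely for illustration, since the paper does not state its layer sizes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step. W has shape (4*H, H + X); b has shape (4*H,)."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x]) + b
    f = sigmoid(z[0:H])          # forget gate: how much of c_prev to keep
    i = sigmoid(z[H:2 * H])      # input gate: how much new information to write
    g = np.tanh(z[2 * H:3 * H])  # candidate values to write into memory
    o = sigmoid(z[3 * H:4 * H])  # output gate: what to release as h_t
    c = f * c_prev + i * g       # updated cell (memory) state
    h = o * np.tanh(c)           # new hidden state, i.e. the cell's output
    return h, c

rng = np.random.default_rng(0)
X, H = 3, 4                                  # illustrative input and hidden sizes
W = rng.normal(scale=0.1, size=(4 * H, H + X))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(5, X)):            # a sequence of 5 input vectors
    h, c = lstm_step(x, h, c, W, b)
print(h.shape, c.shape)
```

Setting the forget gate near 1 and the input gate near 0 leaves the cell state untouched, which is how an LSTM can carry information across many steps.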

The dataset used for this work is US Consumer Finance Complaints, which describes issues people experienced in the marketplace. The 'issue' and 'sub issue' columns in the data show the problems faced by consumers. The product column lists products such as mortgages, student loans, payday loans, debt collection, credit reports, and other financial products and services. Each record or sequence is a combination of the 'issue' and 'sub issue' columns, and the product corresponding to each record is taken as its class label.
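A minimal sketch of how each record could be assembled, assuming the CSV columns are named `product`, `issue` and `sub_issue`; these names are an assumption and should be checked against the actual file. Rows with an empty sub-issue simply use the issue text. The sample rows are made up.

```python
import csv
import io

def load_records(csv_file):
    """Yield (text, label) pairs: 'issue' + 'sub_issue' as text, 'product' as label."""
    for row in csv.DictReader(csv_file):
        text = " ".join(part for part in (row["issue"], row["sub_issue"]) if part)
        yield text, row["product"]

sample = io.StringIO(
    "product,issue,sub_issue\n"
    "Mortgage,Loan servicing payments escrow account,\n"
    "Debt collection,Attempts to collect debt not owed,Debt is not mine\n"
)
for text, label in load_records(sample):
    print(label, "|", text)
```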


RELATED WORK 

Oguzhan Gencoglu [1] proposed a method to categorize messages from Finland's largest online health forum in order to reduce the manual effort of managing messages in the forum. He used a Naive Bayes classifier to classify messages into 16 categories.

Search result diversification enables modern search engines to construct a result list of documents that are relevant to the user query and, at the same time, diverse enough to meet the expectations of a diverse user population. However, not all queries received by a search engine benefit from diversification. Sumit Bhatia, Cliff Brunk and Prasenjit Mitra [2] proposed an approach that analyzes web search queries and classifies them into one of several classes, and they achieved strong classification results.

Lorenzo A. Rossi and Omprakash Gnawali [3] analyzed discussion threads from Coursera forums. They investigated several language-independent features to classify the discussion threads based on the types of interactions among users, extracting features related to the structure, popularity and temporal dynamics of the threads.

Bernard J. Jansen and Danielle Booth [4] proposed a methodology to automatically classify Web queries by topic and user intent. This technique can be used for real-time query classification of web searches.

D. Irazu Hernandez, Jansen Parth Gupta, Paolo Rosso and Martha Rochagy [5] proposed a method to automatically extract features from corpora, analyzed the distribution of these features, and used Naive Bayes and SVM to classify them.



Fig. 1. Workflow of Model 

Kristof Coussement and Dirk Van den Poel [6] introduced a technique for improving complaint-handling strategies through an automatic email-classification system that distinguishes complaints from non-complaints. The methodology also reveals differences in linguistic style between complaint emails and other emails.
SOLUTION APPROACH

The workflow of the model is shown in Fig. 1. The sequence classification approach is explained below.

Dataset Pre-Processing: Tokenization and stop word removal are the common pre-processing steps. Each record is converted into unigram tokens. The Keras Tokenizer API is used for tokenization and other basic text filtering. The most common words (stop words) are removed from the raw text, and other unwanted symbols and numbers are removed using regex operations.
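The paper uses the Keras Tokenizer for this step; the following is a dependency-free Python sketch of the same operations (lower-casing, regex removal of numbers and symbols, unigram tokenization, stop-word removal). The short stop-word list is a hypothetical example, not the list used by the authors.

```python
import re

# Hypothetical, deliberately small stop-word list
STOP_WORDS = {"a", "an", "the", "is", "to", "of", "and", "in", "on", "my", "i"}

def preprocess(text):
    """Lower-case, strip digits and symbols with a regex, split into unigrams, drop stop words."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # remove numbers and punctuation
    return [tok for tok in text.split() if tok not in STOP_WORDS]

print(preprocess("Incorrect information on my credit report (2 accounts)!"))
# ['incorrect', 'information', 'credit', 'report', 'accounts']
```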

Sequence Processing: The first step is to transform each record into a sequence. A vocabulary is created from the tokens, and each word in the dictionary is represented by a unique number. The next step is to pad each sequence to the defined length: if a sequence is shorter than the defined size, zeros are added to pad it, and if it is longer than the maximum sequence length, the excess can be discarded.
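Starting from tokenized records, the sketch below builds a word-to-index vocabulary (index 0 reserved for padding), encodes each record, and pads or truncates it to a fixed length. Padding at the end and keeping the first `max_len` tokens are choices made for this sketch; Keras' `pad_sequences` instead pads and truncates at the start by default (`padding='pre'`, `truncating='pre'`).

```python
def build_vocab(token_lists):
    """Map each distinct token to a unique integer, starting at 1 (0 is padding)."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab) + 1
    return vocab

def encode_and_pad(tokens, vocab, max_len):
    """Encode tokens as integers, then truncate or zero-pad to exactly max_len."""
    ids = [vocab[tok] for tok in tokens if tok in vocab][:max_len]
    return ids + [0] * (max_len - len(ids))

docs = [["incorrect", "information", "credit", "report"], ["debt", "not", "mine"]]
vocab = build_vocab(docs)
print([encode_and_pad(d, vocab, 5) for d in docs])
# [[1, 2, 3, 4, 0], [5, 6, 7, 0, 0]]
```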

Label Encoding: The algorithm cannot read text class labels directly, so the class labels are transformed into an array of numbers.
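A minimal sketch of this step: each product label is mapped to an integer id and then to a one-hot vector, the form expected by a softmax output trained with categorical cross-entropy. Sorting the class names to fix the id order is an arbitrary choice made here.

```python
import numpy as np

def encode_labels(labels):
    """Return (class list, integer ids, one-hot matrix) for string class labels."""
    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    ids = np.array([index[c] for c in labels])
    one_hot = np.eye(len(classes))[ids]  # row k of the identity matrix encodes class k
    return classes, ids, one_hot

classes, ids, one_hot = encode_labels(["Mortgage", "Debt collection", "Mortgage"])
print(classes, ids.tolist())
# ['Debt collection', 'Mortgage'] [1, 0, 1]
print(one_hot)
```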

Word Embedding: Word embedding is the process of representing words in a continuous vector space based on the positions (contexts) in which the words occur. This representation captures semantic similarity between words. The encoded word sequences are given as input to the embedding layer, which produces the distributed representation of the words.
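An embedding layer is essentially a trainable lookup table: row k of a matrix E holds the vector for word id k. The sketch below shows the lookup for a padded sequence and the cosine similarity commonly used to compare word vectors. The matrix here is random and the sizes are illustrative; in the model, E is learned during training, so the similarity printed below is meaningless until then.

```python
import numpy as np

rng = np.random.default_rng(42)
vocab_size, dim = 8, 4                   # illustrative sizes
E = rng.normal(size=(vocab_size, dim))   # embedding matrix, one row per word id
E[0] = 0.0                               # id 0 is reserved for padding

def embed(padded_ids):
    """Look up one vector per token id: shape (seq_len,) -> (seq_len, dim)."""
    return E[np.asarray(padded_ids)]

def cosine(u, v):
    """Cosine similarity between two word vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

seq = embed([1, 2, 3, 0, 0])
print(seq.shape)  # (5, 4)
print(cosine(E[1], E[2]))
```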

LSTM Network: The model is defined by specifying the number of memory neurons, the activation function, and so on. We used softmax as the activation function. The total dimension represents the features, which are converted into memory units; the network is fully connected. The LSTM network learns what to select and what to forget from the features. The model is then compiled by defining the optimization algorithm and the loss function; the Adam optimizer is used. The model is then fitted to the training data and evaluated with the validation data.
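The output and training pieces named above can be sketched in NumPy: a softmax turning class scores into probabilities, categorical cross-entropy as the loss, and Adam updates with the commonly used β1 = 0.9 and β2 = 0.999. To keep the example small, the parameters being optimized are just three class scores and the learning rate of 0.1 is an assumed value; in the real model Keras applies these updates to every weight of the network.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def cross_entropy(probs, one_hot):
    """Categorical cross-entropy for a single example."""
    return float(-np.sum(one_hot * np.log(probs)))

def adam_step(theta, grad, m, v, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update; returns new (theta, m, v). t counts steps starting from 1."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)  # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)  # bias-corrected second moment
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

logits = np.zeros(3)                # scores for 3 classes
target = np.array([1.0, 0.0, 0.0])  # the true class is 0
m, v = np.zeros(3), np.zeros(3)
for t in range(1, 4):
    probs = softmax(logits)
    print(t, round(cross_entropy(probs, target), 4))
    grad = probs - target           # d(loss)/d(logits) for softmax + cross-entropy
    logits, m, v = adam_step(logits, grad, m, v, t)
```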

Overfitting is the main problem in LSTM networks: an overfitted network cannot predict well on unseen data. Our dataset may have thousands of parameters or dimensions, in which case the parameters will try to adjust to the noise in the data. The training accuracy will then be high while out-of-sample data gives low accuracy. Dropout sets a percentage of the units' outputs to zero at random, drawing a new random selection at every training step, and adding dropout layers can reduce overfitting in LSTM networks. During training, the optimization algorithm uses the loss function to update the weights in every epoch. For predicting categories, the Keras library provides specific loss functions, such as categorical cross-entropy. Hyperparameter tuning includes tuning the batch size, number of epochs, learning rate, activation functions, dropout layers, number of neurons, and so on.
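A minimal NumPy sketch of (inverted) dropout as applied during training: each activation is zeroed with probability `rate`, the survivors are scaled by 1/(1 - rate) so the expected value is unchanged, and a fresh random mask is drawn on every call. At inference time dropout is disabled. The rate of 0.5 is an illustrative choice; in Keras the same idea is available as the `Dropout` layer and the `dropout`/`recurrent_dropout` arguments of `LSTM`.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest."""
    if not training or rate == 0.0:
        return activations
    keep = rng.random(activations.shape) >= rate  # True for units that survive
    return activations * keep / (1.0 - rate)

rng = np.random.default_rng(1)
a = np.ones(10)
print(dropout(a, 0.5, rng))                  # roughly half zeros, survivors scaled to 2.0
print(dropout(a, 0.5, rng, training=False))  # unchanged at inference
```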

RESULTS 

This is ongoing work. The dataset contains 555,957 records in total. A sample of 500 instances is taken from each class, and a test set is generated. The validation score was 62.2 when the model was tested with validation data, so there is room to increase the model's accuracy. Hyperparameter tuning is in progress to increase the model's prediction accuracy; the model will be evaluated with different parameters using a grid search.
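A grid search evaluates every combination of candidate hyperparameter values and keeps the one with the best validation score. The sketch below uses `itertools.product` over an illustrative grid; `evaluate` is a hypothetical stand-in for training the LSTM with those settings and returning its validation accuracy, implemented here as a made-up scoring function so the example runs.

```python
from itertools import product

# Illustrative candidate values; not the authors' grid
GRID = {
    "batch_size": [32, 64],
    "epochs": [5, 10],
    "dropout": [0.2, 0.5],
    "units": [64, 128],
}

def evaluate(params):
    """Hypothetical placeholder: train with `params` and return validation accuracy."""
    # Made-up score so the sketch runs; replace with real training and validation.
    return params["units"] / 1000 - abs(params["dropout"] - 0.2) + params["epochs"] / 100

def grid_search(grid, evaluate):
    """Try every combination in the grid; return the best parameters and score."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

print(grid_search(GRID, evaluate))
```

The number of runs grows multiplicatively with each added hyperparameter (16 here), which is why grid searches over LSTM settings are usually kept small or replaced by random search.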


CONCLUSION 

In this work, the main focus is the automatic classification of complaints and queries on internet forums or sites. Typical machine learning classification approaches need pre-defined features or manual intervention to create features from the dataset. The future plan is to optimize the model and maximize its accuracy. Many sequence classification problems in predictive modeling can be solved using this method.